Abstract:  We consider a particular problem which arises when
applying the method of gradient projection for solving constrained
optimization and finite dimensional variational inequalities on
the convex set formed by the convex hull of the standard basis
unit vectors.  The method is especially important for relaxation
labeling techniques applied to problems in artificial intelligence.
Zoutendijk's method for finding feasible directions, which is
relatively complicated in general situations, yields a very simple
finite algorithm for this problem.  We present an extremely simple
algorithm for performing the gradient projection and an independent
verification of its correctness.

Section 1.  Formulation of the Projection Problem.
We treat the following optimization problem.  Let $\mathbb{K}$ be the
convex set defined by

$$\mathbb{K} = \Big\{ x \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} x_i = 1, \quad x_i \ge 0 \ \ \forall i \Big\}.$$

For any vector $x \in \mathbb{K}$, the tangent set $T_x$ is given by

$$T_x = \Big\{ v \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} v_i = 0, \quad v_i \ge 0 \ \text{whenever} \ x_i = 0 \Big\}.$$

The set of feasible directions at $x$ is defined by

$$F_x = T_x \cap \{ v \in \mathbb{R}^n \mid \|v\| \le 1 \},$$

where $\|\cdot\|$ is the standard Euclidean norm.

Given a "current point" $x \in \mathbb{K}$ and an arbitrary direction
$q \in \mathbb{R}^n$, we consider

Problem P:  Find $u \in F_x$ such that

$$q \cdot u \ge q \cdot v \quad \text{for all } v \in F_x.$$

Clearly, Problem P is a linear optimization problem with
quadratic and linear constraints.
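
The three sets defined in this section translate directly into membership tests. The following is a minimal sketch; the function names and the tolerance `tol` are illustrative assumptions and do not appear in the original formulation.

```python
import math

def in_simplex(x, tol=1e-12):
    """True if x is in K: nonnegative components that sum to 1."""
    return all(xi >= -tol for xi in x) and abs(sum(x) - 1.0) <= tol

def in_tangent_set(v, x, tol=1e-12):
    """True if v is in T_x: sum(v) = 0 and v_i >= 0 wherever x_i = 0."""
    if abs(sum(v)) > tol:
        return False
    return all(vi >= -tol for vi, xi in zip(v, x) if abs(xi) <= tol)

def is_feasible_direction(v, x, tol=1e-12):
    """True if v is in F_x: v is in T_x and has Euclidean norm at most 1."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return in_tangent_set(v, x, tol) and norm <= 1.0 + tol
```

For a point such as x = (0.5, 0.5, 0) on an edge of the simplex, a direction with a negative third component fails the tangent test, while (0.5, -0.5, 0) passes both tests.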

Problem P arises in the context of labeling problems in
artificial intelligence, where iterative techniques similar to
gradient ascent in $\mathbb{K}$ have been studied for their use in
reducing ambiguity and achieving consistency [1,2].  The convex set
$\mathbb{K}$ is especially appropriate for labeling problems.  The set
$\mathbb{K}$ can be viewed as the convex hull of the standard unit
vectors $e_i = (0, 0, \ldots, 1, 0, \ldots, 0)$, $i = 1, 2, \ldots, n$,
where the 1 is in position $i$.  The vector $e_i$ is assigned
to an object to denote the labeling of that object with label
number $i$.  If the identity of the object is ambiguous, and no label
can be assigned with complete certainty, a compromise vector
$p \in \mathbb{K}$ can be assigned to the object, so that

$$p = (a_1, \ldots, a_n) = \sum_{i=1}^{n} a_i e_i$$

denotes the labeling of that object with label numbers 1 through
$n$, with degrees of certainty $a_1$ through $a_n$ respectively.

The optimization problem P arises in a solution method
proposed for solving a variational inequality on $\mathbb{K}$ [3].  It
also arises if one is solving a nonlinear optimization problem on
$\mathbb{K}$ by the method of gradient ascent (steepest ascent).  To
motivate Problem P, we present a brief discussion of the latter case.

Consider the problem of maximizing $F(x)$ among all $x \in \mathbb{K}$,
where $F$ is a real-valued differentiable nonlinear function.  If
one uses the method of gradient ascent, then the procedure is to
update successive values of $x$ by replacing $x$ with the vector
$x + \alpha u$, where $\alpha$ is a small positive scalar, and $u$ is
chosen so as to maximize the directional derivative at $x$.  Of course,
$u$ is constrained by the requirement that it must lie tangent to the
space $\mathbb{K}$ at $x$, and $x + \alpha u$ is a numerical way of
moving $x$ infinitesimally in the direction $u$.  Further, since the
directional derivative of $F$ at $x$, $D_u F(x)$, is scaled by the
magnitude of $u$, it suffices to consider directions defined by vectors
of unit length or less.  Since $D_u F(x) = \operatorname{grad} F(x) \cdot u$,
$u$ can be found by maximizing $q \cdot u$ among all vectors
$u \in F_x$, where $q = \operatorname{grad} F(x)$.  Thus $u$ solves
Problem P.
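
A single update of this kind might be sketched as follows. The direction is computed with the threshold procedure stated in Section 3. The names `feasible_direction` and `ascent_step`, the exact test `x[i] == 0`, and in particular the capping of the step length so that the new point stays in the simplex are assumptions made for illustration; the text only asks that the step be small.

```python
import math

def feasible_direction(x, q):
    """Threshold procedure of Section 3: returns u solving Problem P."""
    n = len(x)
    D = {i for i in range(n) if x[i] == 0}
    S = set()
    while True:
        t = sum(q[i] for i in range(n) if i not in S) / (n - len(S))
        S_next = {i for i in D if q[i] < t}
        if S_next == S:
            break
        S = S_next
    y = [0.0 if i in S else q[i] - t for i in range(n)]
    norm = math.sqrt(sum(yi * yi for yi in y))
    return y if norm == 0.0 else [yi / norm for yi in y]

def ascent_step(x, grad, alpha):
    """One update x -> x + step * u, where step is at most alpha."""
    u = feasible_direction(x, grad)
    # Assumption (not in the text): shrink the step so that no
    # component of the new point becomes negative.
    limits = [xi / -ui for xi, ui in zip(x, u) if ui < 0]
    step = min([alpha] + limits)
    return [xi + step * ui for xi, ui in zip(x, u)]
```

Because the components of u sum to zero, the updated vector still sums to one; the cap only guards against leaving the simplex through a face.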

Section 2.  Solution Methodology

Problem P is simply the problem of projecting the given vector
$q$ onto the convex set formed by the tangent set $T_x$, and then
normalizing the length of the result.  If $x$ is an interior point
in $\mathbb{K}$ (i.e., no component $x_i$ is zero), then $T_x$ is a
subspace, and the solution $u$ is simply the length-normalized
orthogonal projection of $q$ onto the subspace.  This is accomplished
by the trivial formulas

$$v = q - \frac{1}{n} \Big( \sum_{i=1}^{n} q_i \Big) \cdot (1, 1, \ldots, 1),$$

$$u = v / \|v\|.$$
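
As a minimal sketch of the interior case, the two formulas amount to subtracting the mean of q and rescaling to unit length. The function name is hypothetical, and the guard for v = 0 is an added assumption (the formulas above leave that case undefined; step 4 of the algorithm in Section 3 returns the zero vector there).

```python
import math

def interior_direction(q):
    """Length-normalized projection of q onto {v : sum(v) = 0}."""
    n = len(q)
    mean = sum(q) / n
    v = [qi - mean for qi in q]          # v = q - (1/n)(sum q_i)(1,...,1)
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm == 0.0:                      # assumption: return 0 when v = 0
        return v
    return [vi / norm for vi in v]
```

For q = (3, 1, 2) the mean is 2, so v = (1, -1, 0) and u is that vector divided by the square root of 2.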
Thus, Problem P is interesting only when $x$ lies on a face or
edge of $\mathbb{K}$.  Topologically, $\mathbb{K}$ is a simplex of degree
$n - 1$, and has boundary surfaces of all lower degrees.  However, if
$x$ lies on one of these surfaces, the set $T_x$ is a convex set
(shaped like a "wedge"), and the solution to Problem P is more
complicated than projecting onto the boundary surface.  For example,
if $q$ lies in $T_x$, then the projection is simply the identity.  On
the other hand, if $q$ lies in a direction that points away from all
tangent directions in the wedge $T_x$, that is, if $q \cdot v \le 0$ for
all $v \in T_x$, then the solution $u$ is the zero vector.  In between,
there will be regions in which $q$ projects to boundary surfaces of
each order greater than or equal to the order of the boundary on which
$x$ lies.

Note  that  a  boundary  face  (or  edge  of  any  given  order)  is 
itself  a  simplex.   However,  the  solution  vector  u  is  not  necessarily 
a  simple  projection  of  q  onto  this  boundary  simplex,  as  noted  above. 

A solution method exists, and can be obtained by applying
algorithms from the theory of feasible directions to the specific
geometry defined by $\mathbb{K}$.  In particular, Zoutendijk offers
finite algorithms [4] for solving problems of the form

    Maximize  $q \cdot u$,  given $q$,
    subject to
        $A u \le b$,
        $u^T B u \le 1$.

Problem P can be formulated in this way, with $B = I$, the $n$ by $n$
identity matrix.  When Zoutendijk's algorithm is applied to Problem
P, certain simplifications can be applied because $B = I$ and because
$\mathbb{K}$ is especially easy to define (i.e., the matrix $A$ has a
very simple form).  These simplifications are equivalent to an
extremely simple finite algorithm for solving Problem P, which we
present in Section 3.

Despite the fact that Zoutendijk's algorithm has been available
for over twenty years, and in spite of the algorithm's simplicity
when applied to Problem P, alternative schemes are commonly used
for projecting direction vectors $q$ onto the set of feasible
directions from a point on the probability space.  These schemes
typically do not solve Problem P, but rather yield feasible directions
$u$ which are "more or less" in the same direction as $q$.  For
example, the "nonlinear probabilistic model" used in many applications
of relaxation techniques to labeling problems defines

$$v = p' - p, \qquad \text{where} \quad p'_i = \frac{p_i (1 + \alpha q_i)}{\sum_{j} p_j (1 + \alpha q_j)},$$

and where $\alpha > 0$ is a small fixed constant.

It is easily verified that $v \in T_p$ as long as $\alpha$ is
sufficiently small.  Note, however, that if $p_i = 0$, then $v_i = 0$.
That is, an iterative scheme based on these formulas can never leave a
face or edge of the space $\mathbb{K}$.
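
The observation that zero components stay at zero can be checked numerically. The sketch below evaluates the update direction of the nonlinear probabilistic model; the function name and the default value of the constant (written `a` in the code) are illustrative assumptions, and the code assumes the constant is small enough that the normalizing sum is positive.

```python
def nonlinear_update_direction(p, q, a=0.1):
    """Return v = p' - p with p'_i proportional to p_i * (1 + a * q_i)."""
    weights = [pi * (1.0 + a * qi) for pi, qi in zip(p, q)]
    total = sum(weights)                 # assumed positive (a small enough)
    p_new = [w / total for w in weights]
    return [pn - pi for pn, pi in zip(p_new, p)]
```

With p = (0.5, 0.5, 0) and q = (1, 0, 5), the third component of v is zero even though q is largest there, so the iterate cannot move off the edge on which p lies.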

Other projection schemes have been studied in connection
with relaxation labeling.  We mention the Product Rule [5],
Bayesian analysis [6], and single component desaturization [7].
There is also an obvious "truncation method" of setting negative
updating components to zero when the corresponding $x_i$'s are zero.
None of these schemes yields a solution to Problem P.  The
connections between relaxation labeling, projection onto a convex
set, and Problem P are not fully addressed in the cited references.
However, in an accompanying paper by Hummel and Zucker, an
algorithm for relaxation labeling is presented which requires,
as a subroutine, an algorithm to solve Problem P.

Why should a problem which seems to be geometrically simple
lead to so many different partial solution methods?  Part of the
answer is related to the formulation of the problems: the need
for the projection operator is not always recognized.  More
importantly, the geometry is not quite as trivial as it first
seems.  Computing the regions of $\mathbb{R}^n$ in which $q$ is
projected to boundary surfaces of different orders requires a lot of
care.  The algorithm presented in the next section performs this
computation.

Section 3.  The Projection Algorithm

The following algorithm solves Problem P.

0.  Accept as input $x \in \mathbb{K}$ and $q \in \mathbb{R}^n$.

1.  Set $k = 1$, $S_1 = \emptyset$, $D = \{ i \mid x_i = 0 \}$.

2.  LOOP:

        $t_k := \frac{1}{n - \#S_k} \sum_{i \notin S_k} q_i$

        $S_{k+1} := \{ i \in D \mid q_i < t_k \}$

        If $S_{k+1} = S_k$, then EXIT LOOP

        $k := k + 1$
    END LOOP

3.  Compute $y$, where

        $y_i = 0$             if $i \in S_k$,
        $y_i = q_i - t_k$     if $i \notin S_k$.

4.  Output $u$, where

        $u = 0$               if $y = 0$,
        $u = y / \|y\|$       otherwise.
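
Steps 0 through 4 can be sketched in a few lines of Python. This is an illustrative transcription, not the authors' code: the function name, the use of 0-based indices, the exact comparison `x[i] == 0` for building D, and the choice to return the final threshold and index set alongside u are all assumptions.

```python
import math

def project_direction(x, q):
    """Return (u, t, S): the solution u of Problem P, the final
    threshold t, and the final index set S."""
    n = len(x)
    D = {i for i in range(n) if x[i] == 0}       # step 1
    S = set()
    while True:                                   # step 2
        # Mean of q over the indices not in S; n - len(S) > 0 because
        # S is a subset of D and x has at least one positive component.
        t = sum(q[i] for i in range(n) if i not in S) / (n - len(S))
        S_next = {i for i in D if q[i] < t}
        if S_next == S:
            break
        S = S_next
    y = [0.0 if i in S else q[i] - t for i in range(n)]   # step 3
    norm = math.sqrt(sum(yi * yi for yi in y))
    u = y if norm == 0.0 else [yi / norm for yi in y]     # step 4
    return u, t, S
```

For x = (0.5, 0.5, 0) and q = (1, 0, -2), the first pass gives t = -1/3 and puts index 2 (0-based) into S; the second pass gives t = 0.5 and leaves S unchanged, so y = (0.5, -0.5, 0).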

One way to verify that the resulting vector is the
correct solution is to observe that $u$ satisfies the Kuhn-Tucker
conditions (at least when $u \ne 0$), which is equivalent to solving
the constrained optimization problem [8].  In this proof, the
Lagrange multiplier $\lambda$ belonging to the constraint
$\sum_i u_i = 0$ has the same value as the final threshold $t_k$.  We
now present an alternate proof.  This proof is self-contained, and
handles the $u = 0$ and $u \ne 0$ cases uniformly.

Before showing that $u$ is the desired solution, however,
we show that the algorithm terminates in a finite number of
steps.


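Proposition:  The $S_k$'s are nested, and thus the algorithm
terminates with at most $\#D + 1$ passes through the loop.

Proof.  Since $S_1 = \emptyset$, $S_1 \subseteq S_2$.  We assume by
induction that $S_{k-1} \subseteq S_k$.  Since $i \in S_k$ implies that
$q_i < t_{k-1}$,

$$\begin{aligned}
(n - \#S_k)\, t_k &= \sum_{i \notin S_k} q_i = \sum_{i \notin S_{k-1}} q_i - \sum_{i \in S_k \setminus S_{k-1}} q_i \\
&\ge (n - \#S_{k-1})\, t_{k-1} - (\#S_k - \#S_{k-1})\, t_{k-1} = (n - \#S_k)\, t_{k-1},
\end{aligned}$$

so $t_k \ge t_{k-1}$.  Then clearly $S_k \subseteq S_{k+1}$, by the
definition of the $S_k$'s.  The proposition follows since the loop
terminates when $S_{k+1} = S_k$, and the nested sets $S_k \subseteq D$
can grow strictly at most $\#D$ times.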
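
The monotonicity of the thresholds and the nesting of the index sets can be observed by recording each pass through the loop. The helper below is a hypothetical tracing variant of step 2, with 0-based indices and an exact zero test on x assumed.

```python
def threshold_trace(x, q):
    """Return the list of (t_k, sorted S_k) pairs, one per loop pass."""
    n = len(x)
    D = {i for i in range(n) if x[i] == 0}
    S = set()
    trace = []
    while True:
        t = sum(q[i] for i in range(n) if i not in S) / (n - len(S))
        trace.append((t, sorted(S)))
        S_next = {i for i in D if q[i] < t}
        if S_next == S:
            return trace
        S = S_next
```

For x = (1, 0, 0, 0) and q = (0, 2, -20, 10), the trace has three passes, with thresholds -2, 4, 5 and index sets {}, {2}, {1, 2}; here #D + 1 = 4, so the bound of the proposition is respected.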

Section 4.  Proof of the Algorithm

Let $k = N$ denote the maximum value of $k$ attained during
the last iteration through the loop (step 2).  Denote by $W$ the
space

$$W = \{ v \in T_x \mid v_i = 0 \ \text{for} \ i \in S_N \}.$$

Note that if $v \in W$ and $\|v\| \le 1$, then $v \in F_x$.


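Lemma.  The maximum of $q \cdot v$, $v \in F_x$, is attained among
$v \in W \cap F_x$.

Proof:  Suppose that $v \in F_x$ maximizes $q \cdot v$, and that
$v_i > 0$ for some $i \in S_N$.  Define $w$ by

$$w_j = \begin{cases} 0 & \text{if } j = i, \\ v_j & \text{if } j \in S_N \setminus \{i\}, \\ v_j + \dfrac{v_i}{n - \#S_N} & \text{if } j \notin S_N. \end{cases}$$

Since $v \in F_x$, $v_j \ge 0$ for $j \in D$, and thus $w_j \ge 0$ for
$j \in D$.  Further, $\sum_j w_j = \sum_j v_j = 0$.  Finally,

$$\begin{aligned}
\|w\|^2 &= \sum_{j \ne i} v_j^2 + \frac{2 v_i}{n - \#S_N} \sum_{j \notin S_N} v_j + \frac{v_i^2}{n - \#S_N} \\
&\le \|v\|^2 - \frac{2 v_i}{n - \#S_N}\, v_i \le \|v\|^2 \le 1.
\end{aligned}$$

Here we used $v_i > 0$, and $j \in S_N$ implies $v_j \ge 0$, so that
$\sum_{j \notin S_N} v_j = -\sum_{j \in S_N} v_j \le -v_i$.  Combining,
we have shown that $w \in F_x$.  But

$$q \cdot w = \sum_{j \in S_N \setminus \{i\}} q_j v_j + \sum_{j \notin S_N} q_j \Big( v_j + \frac{v_i}{n - \#S_N} \Big) = q \cdot v + (t_N - q_i)\, v_i > q \cdot v,$$

since $i \in S_N$ implies $q_i < t_N$, and since $v_i > 0$.  This is a
contradiction, since $q \cdot v$ is maximum among $v \in F_x$.  Thus we
must have $v_i = 0$ for $i \in S_N$.  That is, $v \in W$.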


Theorem:  The output vector $u$ solves Problem P.

Proof:  According to the lemma, it suffices to show that $u \in F_x$,
and $q \cdot u \ge q \cdot v$ for all $v \in W \cap F_x$.

Clearly, $y$ calculated in step 3 of the algorithm satisfies
$y_i = 0$ for $i \in S_N$.  Further, if $i \in D$, $i \notin S_N$, then
$q_i \ge t_N$ according to the definition of $S_{N+1} = S_N$, and so
$y_i = q_i - t_N \ge 0$.  So $y_i \ge 0$ for all $i \in D$.  Finally
$\sum_i y_i = 0$ by direct computation.  Thus $y \in W$.
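
The direct computation referred to here can be written out as follows, using only the definition of the final threshold $t_N$ from step 2 and of $y$ from step 3 (a short derivation added for completeness):

```latex
\sum_{i=1}^{n} y_i
  = \sum_{i \notin S_N} (q_i - t_N)
  = \sum_{i \notin S_N} q_i - (n - \#S_N)\, t_N
  = \sum_{i \notin S_N} q_i - \sum_{i \notin S_N} q_i
  = 0 .
```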

In fact, as can be easily seen, step 3 merely performs
an orthogonal projection of $q$ onto the subspace $W$.  That is,
for any $v \in W$,

$$(q - y) \cdot v = 0.$$

So $q \cdot v = y \cdot v$ for all $v \in W$.
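
The orthogonality relation can be checked componentwise. The sketch below uses only the definition of $y$ in step 3 and the two defining properties of $W$, namely $v_i = 0$ for $i \in S_N$ and $\sum_i v_i = 0$:

```latex
(q - y)_i =
  \begin{cases}
    q_i & \text{if } i \in S_N, \\
    t_N & \text{if } i \notin S_N,
  \end{cases}
\qquad\Longrightarrow\qquad
(q - y) \cdot v
  = \sum_{i \in S_N} q_i v_i + t_N \sum_{i \notin S_N} v_i
  = 0 + t_N \cdot 0
  = 0 .
```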

The output vector $u$ calculated in step 4 is simply a length
normalization of $y$, and so $u$ is in $W$ also.  Since $\|u\| \le 1$,
$u \in F_x$.

Next, observe that since $u \in W$,

$$q \cdot u = y \cdot u = \|y\|.$$

The last equality follows from the definition of $u$ in step 4.

Let $v$ be any vector in $W \cap F_x$.  Then $\|v\| \le 1$, and so

$$q \cdot v = y \cdot v \le \|y\| \cdot \|v\| \le \|y\|,$$

using the Cauchy-Schwarz inequality.  We have therefore shown that

$$u \in F_x \quad \text{and} \quad q \cdot u \ge q \cdot v \quad \text{for all } v \in W \cap F_x.$$

This proves the theorem.
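
As a numerical sanity check of the theorem (not a substitute for the proof), one can compare $q \cdot u$ against $q \cdot v$ for many sampled feasible directions $v$. The sketch below is self-contained; the sampling scheme, the helper names, the seed, and the number of trials are all arbitrary assumptions.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_direction(x, q):
    """Algorithm of Section 3; returns the output vector u."""
    n = len(x)
    D = {i for i in range(n) if x[i] == 0}
    S = set()
    while True:
        t = sum(q[i] for i in range(n) if i not in S) / (n - len(S))
        S_next = {i for i in D if q[i] < t}
        if S_next == S:
            break
        S = S_next
    y = [0.0 if i in S else q[i] - t for i in range(n)]
    norm = math.sqrt(dot(y, y))
    return y if norm == 0.0 else [yi / norm for yi in y]

def random_feasible_direction(x, rng):
    """Draw some v in F_x (an arbitrary sampling scheme)."""
    n = len(x)
    free = [i for i in range(n) if x[i] > 0]
    v = [rng.uniform(-1, 1) if x[i] > 0 else rng.uniform(0, 1)
         for i in range(n)]
    shift = sum(v) / len(free)       # restore sum(v) = 0 on free indices
    for i in free:
        v[i] -= shift
    norm = math.sqrt(dot(v, v))
    scale = rng.uniform(0, 1) / norm if norm > 0 else 0.0
    return [vi * scale for vi in v]

def worst_gap(x, q, trials=2000, seed=0):
    """Largest sampled value of q.v - q.u; should not be positive."""
    rng = random.Random(seed)
    qu = dot(q, project_direction(x, q))
    return max(dot(q, random_feasible_direction(x, rng)) - qu
               for _ in range(trials))
```

If the algorithm is implemented as stated, `worst_gap` should return a value that is zero or negative up to rounding error for any x in the simplex and any q.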